Integrate Automated QDQ placement tool - part 3.3#839
Conversation
Review skipped: auto incremental reviews are disabled on this repository. Check the settings in the CodeRabbit UI.
📝 Walkthrough

These changes introduce a command-line interface and core workflow orchestration for ONNX Q/DQ autotuning. The CLI entry point parses configuration arguments, validates inputs, initializes TensorRT benchmarking, and invokes a region-pattern autotuning workflow that profiles models, applies quantization schemes, benchmarks performance, and exports optimized variants.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI (run_autotune)
    participant Validator as Input Validator
    participant Benchmark as Benchmark Init
    participant Workflow as Autotuning Workflow
    participant Model as ONNX Model
    participant TensorRT as TensorRT Engine
    participant Output as Model Export
    User->>CLI: Invoke with arguments
    CLI->>Validator: Validate model & baseline paths
    Validator-->>CLI: Path valid / exit
    CLI->>Benchmark: Initialize benchmark instance
    Benchmark->>TensorRT: Configure with timing cache & plugins
    TensorRT-->>Benchmark: Instance ready
    Benchmark-->>CLI: Benchmark initialized
    CLI->>Workflow: Invoke region_pattern_autotuning_workflow
    Workflow->>Model: Load ONNX model
    Workflow->>Model: Load pattern cache & QDQ baseline
    Workflow->>Workflow: Profile regions & apply node filters
    loop For each region
        Workflow->>Workflow: Generate quantization schemes
        Workflow->>Model: Apply Q/DQ to region
        Workflow->>TensorRT: Benchmark model
        TensorRT-->>Workflow: Latency result
    end
    Workflow->>Output: Export optimized model
    Output-->>Workflow: Export complete
    Workflow->>Output: Save state checkpoint
    Output-->>Workflow: State saved
    Workflow-->>CLI: Return autotuner result
    CLI-->>User: Exit with status
```
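The control flow in the diagram can be sketched as a small Python driver. All names below (`schemes_for`, `apply_qdq`, `benchmark`, `export`) are hypothetical stand-ins for illustration, not the real ModelOpt API:

```python
def autotune(model, regions, schemes_for, apply_qdq, benchmark, export):
    """Toy driver mirroring the sequence diagram: for each region, try each
    quantization scheme, benchmark it, and export the fastest variant."""
    best_latency, best_scheme = float("inf"), None
    for region in regions:
        for scheme in schemes_for(region):
            candidate = apply_qdq(model, region, scheme)  # insert Q/DQ nodes
            latency = benchmark(candidate)                # e.g. TensorRT timing
            if latency < best_latency:
                best_latency, best_scheme = latency, scheme
    export(model, best_scheme)                            # save best model
    return best_latency, best_scheme
```

In the real workflow the benchmark step is backed by a TensorRT engine build plus timed inference runs, and state is checkpointed between regions.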
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~50 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around line 107-116: init_benchmark_instance can return None on failure but
the current flow continues; update the caller (the block after
log_benchmark_config) to check the return value of init_benchmark_instance (when
called with use_trtexec=args.use_trtexec,
plugin_libraries=args.plugin_libraries, timing_cache_file=args.timing_cache,
warmup_runs=args.warmup_runs, timing_runs=args.timing_runs,
trtexec_args=trtexec_args) and if it returns None, log an error and exit early
(e.g., sys.exit(1)) so the script fails fast instead of producing misleading
infinite benchmark results.
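The fail-fast pattern the review asks for can be sketched as a thin wrapper; `init_benchmark_or_exit` is a hypothetical helper, and the only assumption taken from the comment is that the initializer returns an instance on success and `None` on failure:

```python
import logging
import sys

logger = logging.getLogger(__name__)


def init_benchmark_or_exit(init_fn, **kwargs):
    """Call a benchmark initializer and abort the CLI if it fails.

    init_fn is assumed to return a benchmark instance on success and
    None on failure, as described in the review comment above.
    """
    instance = init_fn(**kwargs)
    if instance is None:
        logger.error("Benchmark initialization failed; aborting autotuning.")
        sys.exit(1)  # fail fast instead of reporting infinite latencies
    return instance
```

The caller would pass the same keyword arguments it currently forwards (`use_trtexec`, `plugin_libraries`, `timing_cache_file`, and so on).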
In `@modelopt/onnx/quantization/autotune/workflows.py`:
- Around line 239-246: The Config instantiation currently hardcodes verbose=True
which forces noisy logging; change the call that constructs Config (the
Config(...) in this file) to accept a verbose parameter (e.g., verbose=verbose
or verbose=args_verbose) and thread that boolean from the CLI invocation that
creates/starts the autotuner (update the CLI call site to pass args.verbose into
the function that triggers this code), ensuring logger.info stays unchanged but
Config uses the provided verbose flag instead of True.
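Threading the flag through looks roughly like this; the `Config` below is a minimal stand-in with only the field under discussion, not the real class:

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Minimal stand-in for the real Config; only the field under discussion.
    verbose: bool = False


def build_config(cli_verbose: bool) -> Config:
    # Thread the CLI flag through instead of hardcoding verbose=True.
    return Config(verbose=cli_verbose)
```

At the CLI call site this becomes `build_config(args.verbose)`, so quiet runs stay quiet by default.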
Force-pushed from 09e136a to e3ad6da
Please add a test for workflows. Example: https://github.com/gcunhase/TensorRT-Model-Optimizer/blob/85228103a29662c721d862cb1cec38b0193699f5/tests/unit/onnx/quantization/autotune/test_workflows.py#L36
Force-pushed from 88b34a4 to ebc6087
Added, please check.
/ok to test 0414b81
Force-pushed from 0414b81 to 1aa4818
@willg-nv, I'm seeing the following errors in the
Force-pushed from 8f7fe19 to 95b4a5e
@willg-nv the precommit fixes are working, thank you! One last thing is the
Force-pushed from a2e3016 to 4bf8fbd
done
Pull request overview
This PR implements the command-line interface (CLI) for the ONNX Q/DQ autotuning framework, completing part 3.3 of the automated QDQ placement tool integration. The PR builds upon the benchmark module (PR #837) and QDQAutotuner class (PR #838), providing a complete end-to-end workflow for automated quantization optimization of ONNX models using pattern-based region analysis and TensorRT performance measurement.
Changes:
- Added CLI (`__main__.py`) with comprehensive argument parsing for model paths, quantization parameters, TensorRT benchmarking configuration, and workflow control
- Implemented high-level workflow orchestration (`workflows.py`) managing pattern-based region optimization, state persistence, baseline comparison, and benchmarking
- Extended common data structures with `PatternSchemes`, `PatternCache`, and `Config` classes for managing quantization schemes, caching patterns, and configuration
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file

| File | Description |
|---|---|
| `modelopt/onnx/quantization/autotune/__main__.py` | CLI implementation with argument parsing, input validation, and workflow invocation |
| `modelopt/onnx/quantization/autotune/workflows.py` | Workflow functions for benchmark initialization, pattern-based autotuning, and region filtering |
| `modelopt/onnx/quantization/autotune/common.py` | Extended with `PatternSchemes`, `PatternCache`, and `Config` dataclasses for scheme management and serialization |
| `tests/unit/onnx/quantization/autotune/test_config.py` | Unit tests for `Config` class default values, custom values, and parameter validation |
| `tests/gpu/onnx/quantization/autotune/test_workflow.py` | GPU test for quantized model export with Q/DQ insertion |
| `tests/_test_utils/onnx/quantization/autotune/models.py` | Test helper for creating simple ONNX models for autotuner testing |
## What does this PR do?

This PR integrates the benchmark module into the QDQ autotuner. The benchmark module is used to evaluate ONNX model performance. This PR is 1/3 of #703. Once all small PRs are merged, #703 can be closed.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No, documentation will be added in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, the changelog will be updated when all changes are merged.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added ONNX quantization autotuning capabilities with a consolidated module providing streamlined import paths for core components.
  * Introduced unified benchmarking framework supporting TensorRT-based model evaluation with both command-line and Python API implementations.
  * Added support for timing cache persistence, custom plugin libraries, shape validation, and dynamic input shape configuration for flexible model testing and optimization.

---------

Signed-off-by: Will Guo <[email protected]>
Force-pushed from 8a363da to bee717a
## Review: PR #839 - Integrate Automated QDQ placement tool - part 3.3

### Overall Assessment

This PR implements the CLI and high-level workflows for QDQ autotuning, completing the autotuner package. The code is well-structured with comprehensive CLI arguments, good workflow orchestration, and thorough tests. Several minor issues need addressing before merge.

### ✅ What's Good

### 🚨 Critical Issues

None identified; the code is production-ready after addressing the medium-priority items.
| File | Changes | Purpose |
|---|---|---|
| `__main__.py` | +302 | CLI entry point with argparse |
| `workflows.py` | +376 | High-level autotuning workflow |
| `common.py` | +551 | PatternSchemes, PatternCache, Config |
| `test_workflow.py` | +82 | GPU integration tests |
| `test_config.py` | +97 | Unit tests for Config |
| `models.py` | +4 | Update test model dimensions |
### ✅ Recommendations

Before merge:
1. Fix `strip()` usage: use `removesuffix(".onnx")` instead
2. Add missing `default_dq_dtype` to the Config docstring
3. Fix docstring indentation in the Config class

Nice to have:
4. Consider `tempfile.TemporaryDirectory()` in tests
5. Consider clarifying the list modification logic in `add_pattern_schemes`
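The first recommendation is worth a concrete illustration: `str.strip` interprets its argument as a *set of characters* to remove from both ends, not as a suffix, so it can eat into the filename stem, whereas `str.removesuffix` (Python 3.9+) removes only an exact trailing match:

```python
name = "annex.onnx"

# strip(".onnx") removes any of the characters {'.', 'o', 'n', 'x'}
# from both ends, corrupting the stem:
assert name.strip(".onnx") == "anne"

# removesuffix removes only the exact trailing suffix:
assert name.removesuffix(".onnx") == "annex"
```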
Overall: This PR is well-implemented and necessary for the QDQ autotuning CLI. The high-level workflow abstraction is clean, and the CLI design is user-friendly with good examples and help text.
@willg-nv could you take a look at 2, 3 and the simplification suggestion?
### 🔴 Security Issue Identified

File: (line ~32)

Issue: Unauthorized comment bypassing a Bandit security check.

Per security policy, such comments require:

Neither condition is met. Please remove the comment and use a safer path, or obtain approval with justification.

Suggested fix:

cc: @NVIDIA/modelopt-setup-codeowners
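The flagged comment is most likely a Bandit `# nosec` suppression on a hardcoded temporary path (the exact line is not shown in this excerpt, so this is an assumption). As a general pattern, creating the scratch location with `tempfile` avoids the warning without any suppression:

```python
import os
import tempfile

# Instead of a hardcoded "/tmp/..." path silenced with a Bandit
# `# nosec` comment, create a unique per-run scratch directory:
cache_dir = tempfile.mkdtemp(prefix="autotune_")
timing_cache = os.path.join(cache_dir, "timing.cache")
```

`tempfile.mkdtemp` creates the directory with owner-only permissions, which is what checks like Bandit B108 are meant to enforce.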
🔴 Security Issue: Unauthorized
False alarm, no file path.
@cjluo-nv all other Copilot comments are resolved.
## What does this PR do?

This PR implements the QDQAutotuner class. This class drives the main autotuner workflow. The workflow:
1. uses RegionSearch to build regions
2. generates QDQ ONNX models and evaluates performance
3. saves the best model

This PR is part 2/4 of #703.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

**Overview:** ?

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Not in this part.
- **Did you add or update any necessary documentation?**: No, documentation will be updated in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, the changelog will be updated when all changes are ready.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Introduced ONNX Q/DQ autotuning framework with automatic region discovery and pattern-based optimization.
  * Added model profiling and quantization scheme generation capabilities.
  * Enabled state persistence and quantization model export functionality.
  * Introduced configuration management for quantization parameters and profiling workflows.

---------

Signed-off-by: Will Guo <[email protected]>
Signed-off-by: Will Guo <[email protected]>
Force-pushed from 9ae36c7 to ef161e8
/ok to test ef161e8
Signed-off-by: Will Guo <[email protected]>
Head branch was pushed to by a user without write access
/ok to test c8274b8
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #839   +/-   ##
=======================================
- Coverage   72.15%   72.05%   -0.10%
=======================================
  Files         210      210
  Lines       23515    23549      +34
=======================================
+ Hits        16967    16968       +1
- Misses       6548     6581      +33
```

☔ View full report in Codecov by Sentry.
What does this PR do?
This PR implements the QDQ autotuner CLI. This is the initial version of the CLI; it will be integrated into modelopt.onnx.quantization.autotune.
Usage:
PR 3.1: #837
PR 3.2: #838
PR 3.3: #839
Overview: ?
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes